1.  INTRODUCTION

There is an extensive literature on measures of diversity
within populations and dissimilarity or similarity between
populations. They have been used in a wide variety of studies in
anthropology (Rao, 1948; Mahalanobis, Majumdar and Rao, 1949;
Majumdar and Rao, 1958; Rao, 1971a, b, 1977), in genetics
(Cavalli-Sforza, 1969; Karlin et al., 1979; Morton and Lalouel,
1973; Nei, 1978; Sanghvi, 1953; Sanghvi and Balakrishnan, 1972),
in economics (Gini, 1912; Sen, 1973), in sociology (Agresti and
Agresti, 1978) and in biology (Sokal and Sneath, 1963; Pielou,
1975; Patil and Taillie, 1979). A complete bibliography of
papers on measures of diversity and their applications has been
compiled by Dennis et al. (1979).

Most of these measures are based on heuristic considerations;
some are derived from mathematically well postulated
axioms, while others are constructed using possible models for
genetic and environmental mechanisms causing differences between
individuals and populations. The object of this paper is to
review some of these measures and to provide some unified
approaches for deriving them.

We consider a set of populations $\{\pi_i\}$ where the individuals
of each population are characterized by a set of measurements
$X \in (\Omega, \mathcal{B})$, a measurable space. The probability
distribution function of X in $\pi_i$ is denoted by $P_i$ and the
convex set generated by $\{P_i\}$ is denoted by $\mathcal{P}$. A
diversity coefficient (DIVC) is a mapping from $\mathcal{P}$ into the
real line, which reflects differences between individuals (X's)
within a population. We denote the DIVC of $\pi_i$ by $H_i$ (the
symbol H is used to indicate heterogeneity). A dissimilarity
coefficient (DISC) or a similarity coefficient (SIMC) is a mapping
from $\mathcal{P} \times \mathcal{P}$ into the real line, which
reflects the differences or similarities between populations. We
denote a DISC between $\pi_i$ and $\pi_j$ by $D_{ij}$ and a SIMC
by $S_{ij}$.

2.  COEFFICIENTS  BASED  ON  INTRINSIC 
DIFFERENCES  BETWEEN  INDIVIDUALS 

2.1  General  Theory 

We start first by choosing a non-negative symmetric function
$d(X_1, X_2)$ which is a measure of difference between two
individuals with $X = X_1$ and $X = X_2$, without any reference
to the probability distributions of $X_1$ and $X_2$. The choice
of $d(X_1, X_2)$ naturally depends on the nature of the practical
problem under investigation. We define the DIVC (diversity
coefficient) of $\pi_i$ as

    $H_i = \int d(X_1, X_2)\, P_i(dX_1)\, P_i(dX_2)$    (2.1.1)

i.e., as the average difference between two randomly drawn
individuals from $\pi_i$. Suppose that one individual is drawn from
$\pi_i$ and another from $\pi_j$. Then the average difference is

    $H_{ij} = \int d(X_1, X_2)\, P_i(dX_1)\, P_j(dX_2)$.    (2.1.2)

We expect $H_{ij}$ to be larger than the average of $H_i$ and $H_j$,
in which case the DISC (dissimilarity coefficient) between $\pi_i$
and $\pi_j$ may be defined by what can be termed the Jensen
difference,

    $D_{ij} = H_{ij} - \tfrac{1}{2}(H_i + H_j)$.    (2.1.3)


The expression (2.1.3) will be non-negative for any i and j
iff $d(X_1, X_2)$ is chosen such that the function H defined on
$\mathcal{P}$ as in (2.1.1) is concave. This can be easily verified
by considering $P_0 \in \mathcal{P}$ where

    $P_0 = \lambda P_i + (1 - \lambda) P_j, \quad 0 < \lambda < 1$    (2.1.4)

and computing

    $H_0 = \int d(X_1, X_2)\, P_0(dX_1)\, P_0(dX_2)
         = \lambda^2 H_i + (1 - \lambda)^2 H_j + 2\lambda(1 - \lambda) H_{ij}$.    (2.1.5)

Then

    $H_0 - (\lambda H_i + (1 - \lambda) H_j)
         = 2\lambda(1 - \lambda)(H_{ij} - \tfrac{1}{2} H_i - \tfrac{1}{2} H_j)
         = 2\lambda(1 - \lambda) D_{ij}$.    (2.1.6)

The concavity of H ensures that $D_{ij} \ge 0$ and vice versa.
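The identity (2.1.6) can be checked numerically on a small discrete example. The sketch below is only an illustration under stated assumptions: the three-state distributions `P_i` and `P_j`, the mixing weight `lam`, and the squared-difference choice of `d` are hypothetical and do not come from the paper.

```python
from itertools import product

def mean_difference(d, p, q):
    """Average of d(x1, x2) with x1 ~ p and x2 ~ q (dicts: value -> probability)."""
    return sum(px * qx * d(x1, x2)
               for (x1, px), (x2, qx) in product(p.items(), q.items()))

def jensen_difference(d, p, q):
    """D_ij = H_ij - (H_i + H_j) / 2, as in (2.1.3)."""
    return mean_difference(d, p, q) - 0.5 * (
        mean_difference(d, p, p) + mean_difference(d, q, q))

def mixture(p, q, lam):
    """Distribution lam * p + (1 - lam) * q on the union of the supports."""
    keys = set(p) | set(q)
    return {x: lam * p.get(x, 0.0) + (1 - lam) * q.get(x, 0.0) for x in keys}

# Hypothetical example: squared difference on three numeric states.
d = lambda a, b: (a - b) ** 2
P_i = {0: 0.5, 1: 0.3, 2: 0.2}
P_j = {0: 0.1, 1: 0.2, 2: 0.7}
lam = 0.3

H_i = mean_difference(d, P_i, P_i)
H_j = mean_difference(d, P_j, P_j)
D_ij = jensen_difference(d, P_i, P_j)
P_0 = mixture(P_i, P_j, lam)
H_0 = mean_difference(d, P_0, P_0)

lhs = H_0 - (lam * H_i + (1 - lam) * H_j)   # left side of (2.1.6)
rhs = 2 * lam * (1 - lam) * D_ij            # right side of (2.1.6)
```

For this particular `d`, the Jensen difference equals the squared difference of the two means, so `D_ij` is non-negative, consistent with the concavity of H.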


2.2  Some  Examples 

(1) Let $X \in R^m$, a real vector space of m dimensions
furnished with an inner product $(x, y) = x'Ay$, where A is a
positive definite matrix. Define

    $d(X_1, X_2) = (X_1 - X_2,\; X_1 - X_2)$.    (2.2.1)

Let $X \sim (\mu_i, \Sigma_i)$ in $\pi_i$ (i.e., X is distributed
with mean vector $\mu_i$ and dispersion matrix $\Sigma_i$). Then

    $H_i = 2\, \mathrm{tr}\, A \Sigma_i$
    $H_{ij} = \mathrm{tr}\, A \Sigma_i + \mathrm{tr}\, A \Sigma_j + \delta_{ij}' A \delta_{ij}$    (2.2.2)

where tr stands for the trace of a matrix and $\delta_{ij} = \mu_i - \mu_j$.
Applying the formula (2.1.3)

    $D_{ij} = \delta_{ij}' A \delta_{ij}$.    (2.2.3)

If $\Sigma_i = \Sigma$ for all i and $A = \Sigma^{-1}$, (2.2.3)
becomes the Mahalanobis $D^2$ between $\pi_i$ and $\pi_j$.

(2) Let $X = (x_1, \ldots, x_m)$ where $x_i$ can take only a
finite number of values. For instance $x_i$ may stand for the
type of gene allele at a given locus i on a chromosome. In
such a case an appropriate measure of difference between two
vectors $X_1$ and $X_2$ is

    $d(X_1, X_2) = m - \sum_r \delta_r$    (2.2.4)

where $\delta_r = 1$ if the r-th components of $X_1$ and $X_2$ agree
and zero otherwise. Let $x_r$ take $k_r$ different values with
probabilities
Note 1. When m = 1, we have a single multinomial and
the expression (2.2.8) reduces to the Gini-Simpson index

    $1 - \sum_{i=1}^{k} p_i^2$    (2.2.9)

where $p_1, \ldots, p_k$ are the cell probabilities. [This measure
was introduced by Gini (1912) and used by Simpson (1949) in
biological work.] The properties of (2.2.9) have been studied
by various authors (Bhargava and Doyle, 1974; Bhargava and
Uppuluri, 1975; Agresti and Agresti, 1978).

Note 2. It is seen that $H_i$ as defined in (2.2.7) depends
only on the marginal distributions of $x_i$, i = 1,...,m,
and is additive with respect to the characters examined.
These properties arise from the way the difference function
(2.2.4) is defined. The DISC (2.2.8) is specially useful in
evolutionary studies as suggested by Nei (1978).

Note 3. We may consider the joint distribution of
$(x_1, \ldots, x_m)$ as a combined multinomial with
$k = k_1 \times \cdots \times k_m$ classes and apply the formula
(2.1.1) to measure diversity. In such a case the difference
between two individuals takes the value zero when all the
components $x_i$ agree and the value 1 if at least one is
different. This leads to an expression different from (2.2.8),
as the basic function for assessing the differences between
individuals is not the same. When $x_1, \ldots, x_m$ are
independently distributed, an explicit expression for the DIVC
based on the combined multinomial reduces to

    $H = 1 - [1 - H^{(1)}] \cdots [1 - H^{(m)}]$    (2.2.10)

where $H^{(r)}$ is the DIVC based on $x_r$, the r-th character
only. It may be noted that the expression for DIVC given in
(2.2.7) is $H = \sum_r H^{(r)}$ whether $x_1, \ldots, x_m$ are
independently distributed or not.
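The product form (2.2.10) can be illustrated with a short computation. This is a minimal sketch, assuming three independent characters with hypothetical marginal probabilities (`marginals` is not data from the paper): it builds the combined multinomial explicitly, computes its Gini-Simpson diversity, and compares it with the product formula and with the additive form.

```python
from itertools import product
from math import prod

def gini_simpson(p):
    """Gini-Simpson diversity 1 - sum p_r^2 of a multinomial."""
    return 1.0 - sum(x * x for x in p)

# Hypothetical marginals for m = 3 independent characters.
marginals = [[0.5, 0.5], [0.2, 0.3, 0.5], [0.9, 0.1]]

# Cell probabilities of the combined multinomial with k_1 x k_2 x k_3 classes.
joint = [prod(cell) for cell in product(*marginals)]

H_combined = gini_simpson(joint)                                   # direct computation
H_product = 1.0 - prod(1.0 - gini_simpson(p) for p in marginals)   # form (2.2.10)
H_additive = sum(gini_simpson(p) for p in marginals)               # sum of the H^(r)
```

Here `H_combined` and `H_product` agree, while `H_additive` is a different quantity because it rests on a different difference function between individuals.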

2.3 Apportionment of DIV

With the DIVC as defined by (2.1.1) and using the concavity
property, the DIV in a mixture of populations can be
apportioned in a natural way as between and within populations.
If $P_1, \ldots, P_k$ are the distributions of X in
$\pi_1, \ldots, \pi_k$ and $\lambda_1, \ldots, \lambda_k$ are the
a priori probabilities, then the distribution in the mixture
$\pi_0$ is $\lambda_1 P_1 + \cdots + \lambda_k P_k$. It is easily
seen that

    $H_0 = \sum_i \lambda_i H_i + \sum_i \sum_j \lambda_i \lambda_j D_{ij}
         = H(w) + D(b)$    (2.3.1)

where $D_{ij} = H_{ij} - (H_i + H_j)/2$ is the DISC between $\pi_i$
and $\pi_j$. H(w), the DIV within populations, is the weighted
average of the DIV's within populations and D(b), the DIS between
populations, is the weighted average of the DISC's between
all pairs of populations. The ratio

    $G(b) = D(b) / H_0$    (2.3.2)

is an index of diversity between populations.
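The decomposition (2.3.1) can be sketched numerically for multinomial populations. The example assumes the 0/1 mismatch difference between two individuals (so that each within-population diversity is a Gini-Simpson index); the three populations in `pops` and the prior probabilities in `lam` are hypothetical.

```python
def gini_simpson(p):
    """H_i for the mismatch difference: probability that two draws differ."""
    return 1.0 - sum(x * x for x in p)

def cross_diversity(p, q):
    """H_ij: probability that a draw from p and a draw from q differ."""
    return 1.0 - sum(a * b for a, b in zip(p, q))

# Hypothetical populations (rows of cell probabilities) and prior probabilities.
pops = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4]]
lam = [0.5, 0.3, 0.2]
k = len(pops)

H = [gini_simpson(p) for p in pops]
D = [[cross_diversity(pops[i], pops[j]) - 0.5 * (H[i] + H[j])
      for j in range(k)] for i in range(k)]

# Mixture distribution and the two sides of (2.3.1).
mix = [sum(lam[i] * pops[i][r] for i in range(k)) for r in range(len(pops[0]))]
H_0 = gini_simpson(mix)
H_w = sum(lam[i] * H[i] for i in range(k))
D_b = sum(lam[i] * lam[j] * D[i][j] for i in range(k) for j in range(k))
G_b = D_b / H_0    # ratio (2.3.2)
```

The double sum in `D_b` runs over all ordered pairs, including i = j, where the DISC vanishes.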
Different choices of the difference function $d(X_1, X_2)$ may
give different values to the ratio G(b). In Section 3, we
shall discuss this problem in a more general context.

Let us consider k populations as in example (1) of
Section 2.2 where in $\pi_i$, the m-vector variable
$X \sim (\mu_i, \Sigma)$ and choose $d(X_1, X_2)$ as the Mahalanobis
$D^2$ (formula (2.2.3) with $A = \Sigma^{-1}$). Further let $\pi_0$
be a mixture of $\pi_1, \ldots, \pi_k$ with a priori probabilities
$\lambda_1, \ldots, \lambda_k$. Then using the expressions (2.2.2),
the decomposition (2.3.1) becomes

    $H_0 = H(w) + D(b)
         = 2m + \sum_i \sum_j \lambda_i \lambda_j \delta_{ij}' \Sigma^{-1} \delta_{ij}
         = 2m(1 + V)$    (2.3.3)

where $\delta_{ij} = \mu_i - \mu_j$. Thus the diversity within
populations is 2m and the ratio G(b) of (2.3.2) is V/(1 + V), an
increasing function of V, which is, apart from the factor 1/(2m),
the weighted combination of Mahalanobis $D^2$'s for all pairs of
populations. The author has suggested (see Mahalanobis,
Majumdar and Rao, 1949) the use of an estimate of V in the
selection of variables to maximize dissimilarity between
populations.

Let us consider example (2) of Section 2.2 and denote by
$\pi_0$ the mixture of $\pi_1, \ldots, \pi_k$ with a priori
probabilities $\lambda_1, \ldots, \lambda_k$. In this case (2.3.1)
becomes, with $J_{ij}$ as defined in (2.2.7),

    $H_0 = m \big[ \sum_i \lambda_i (1 - J_{ii})
           + \sum_i \sum_j \lambda_i \lambda_j (\tfrac{1}{2} J_{ii} + \tfrac{1}{2} J_{jj} - J_{ij}) \big]$    (2.3.4)

which is the decomposition obtained by Nei (1973) and
Chakravarthy (1974). The ratio G(b) defined in (2.3.2) is

    $G(b) = \dfrac{\sum_i \sum_j \lambda_i \lambda_j (\tfrac{1}{2} J_{ii} + \tfrac{1}{2} J_{jj} - J_{ij})}
                  {1 - \sum_i \sum_j \lambda_i \lambda_j J_{ij}}$.    (2.3.5)

The ratio (2.3.5) obtained by considering only the two populations
$\pi_i$ and $\pi_j$ with equal prior probabilities,

    $\dfrac{J_{ii} + J_{jj} - 2 J_{ij}}{4 - J_{ii} - J_{jj} - 2 J_{ij}}$,    (2.3.6)

is the hybridity coefficient of Morton (1973) who used it as
a DISC between $\pi_i$ and $\pi_j$ in phylogenetic studies.

2.4  Decomposition  of  DIVC  and  DISC 

In the method outlined in Section 2.1, the basic expression
which determines the DIVC and DISC is the difference function
$d(X_1, X_2)$. Any decomposition of $d(X_1, X_2)$ such as

    $d(X_1, X_2) = d_1(X_1, X_2) + \cdots + d_c(X_1, X_2)$    (2.4.1)

provides us with a corresponding decomposition of the DIVC for
$\pi_i$

    $H_i = H_i^{(1)} + \cdots + H_i^{(c)}$    (2.4.2)

where $H_i^{(s)} = E[d_s(X_1, X_2) \mid P_i]$, and of the DISC between
$\pi_i$ and $\pi_j$

    $D_{ij} = D_{ij}^{(1)} + \cdots + D_{ij}^{(c)}$    (2.4.3)

where $D_{ij}^{(s)}$ is obtained from $H_i^{(s)}$, $H_j^{(s)}$ and
$H_{ij}^{(s)}$ using the formula (2.1.3).

Let $X \sim (\mu_i, \Sigma)$ in $\pi_i$ and denote the eigenvalues of
$\Sigma$ by $\theta_1, \ldots, \theta_m$ and the corresponding
eigenvectors by $L_1, \ldots, L_m$. If we choose

    $d(X_1, X_2) = (X_1 - X_2)'(X_1 - X_2)$

i.e., the simple (squared) Euclidean distance in $R^m$, then

    $d(X_1, X_2) = [L_1'(X_1 - X_2)]^2 + \cdots + [L_m'(X_1 - X_2)]^2$    (2.4.4)

gives the decomposition of DIVC for $\pi_i$

    $H_i = 2\, \mathrm{tr}\, \Sigma = 2\theta_1 + \cdots + 2\theta_m$    (2.4.5)

which is the familiar decomposition of total variability with
respect to m characters in terms of principal components
(Rao, 1964). The corresponding decomposition of DISC between
$\pi_i$ and $\pi_j$ is

    $D_{ij} = \delta_{ij}' \delta_{ij} = (L_1' \delta_{ij})^2 + \cdots + (L_m' \delta_{ij})^2$    (2.4.6)

where $\delta_{ij} = \mu_i - \mu_j$, the difference in the mean
vectors for $\pi_i$ and $\pi_j$. However, if we choose

    $d(X_1, X_2) = (X_1 - X_2)' \Sigma^{-1} (X_1 - X_2)$

i.e., the Mahalanobis distance between two individuals, then
we have a different decomposition

    $D_{ij} = \delta_{ij}' \Sigma^{-1} \delta_{ij}
            = \dfrac{1}{\theta_1} (L_1' \delta_{ij})^2 + \cdots + \dfrac{1}{\theta_m} (L_m' \delta_{ij})^2$.    (2.4.6)


Note that the eigenvectors provide a transformation
of the original measurements into uncorrelated variables, in
which case the Mahalanobis distance can be written as the sum
of Mahalanobis distances due to different uncorrelated
variables. We can choose any arbitrary set of vectors
$M_1, \ldots, M_m$ such that $M_i' \Sigma M_j = 0$ for $i \ne j$ and
$M_i' \Sigma M_i = 1$, to obtain a decomposition

    $D_{ij} = (M_1' \delta_{ij})^2 + \cdots + (M_m' \delta_{ij})^2
            = D_{ij}^{(1)} + \cdots + D_{ij}^{(m)}$.    (2.4.7)

By combining some of the $D_{ij}^{(r)}$'s on the right hand side of
(2.4.7), we obtain decompositions of $D_{ij}$ with a smaller number
of components.

If we choose

    $M_1 = (\sigma' \Sigma^{-1} \sigma)^{-1/2}\, \Sigma^{-1} \sigma$    (2.4.8)

in (2.4.7), where $\sigma$ is the vector of standard deviations of
the individual characters (i.e., square roots of diagonal
elements of $\Sigma$), then the first component
$(M_1' \delta_{ij})^2$ is the $D^2$ due to size, and the residual
after subtracting it from $D_{ij}$ represents the distance due to
shape factors between the two populations.


Penrose  (1954)  obtained  a  similar  decomposition  of  Karl 
Pearson's  CRL  (coefficient  of  racial  likeness)  in  terms  of 
size  and  shape.  The  Penrose  indices  do  not  take  into  account 
the  correlations  that  may  exist  between  characters.  For  further 
details  regarding  the  use  of  size  and  particular  shape  factors 
reference  may  be  made  to  Rao  (1962,  1971b). 

2.5  Similarity  Coefficients  (SIMC's) 

Instead of a difference measure between two individuals,
it may be natural to consider a similarity function $s(X_1, X_2)$
and define $S_i$, $S_j$ and $S_{ij}$ by taking expectations analogous
to $H_i$, $H_j$ and $H_{ij}$. Then the DIVC of $\pi_i$ may be defined
by a suitable decreasing function of $S_i$, such as $1 - S_i$ or
$-\log S_i$, specially when the range of $S_i$ is (0,1). The DISC
obtained by choosing $H_i = 1 - S_i$ is

    $D_{ij} = \tfrac{1}{2}(S_i + S_j) - S_{ij}$    (2.5.1)

and that by choosing $H_i = -\log S_i$ is

    $D_{ij} = \tfrac{1}{2}(\log S_i + \log S_j) - \log S_{ij}
            = -\log \dfrac{S_{ij}}{\sqrt{S_i S_j}}$.    (2.5.2)

For instance, in the second example of Section 2.2, a natural
definition is $s(X_1, X_2) = (\sum_r \delta_r)/m$, which lies in the
range (0,1). Then

    $S_i = J_{ii}, \quad S_j = J_{jj}, \quad S_{ij} = J_{ij}$    (2.5.3)

where $J_{ij}$ are as defined in (2.2.7), and using (2.5.1) and
(2.5.2) we have the alternative forms

    $D_{ij} = \tfrac{1}{2}(J_{ii} + J_{jj}) - J_{ij}$    (2.5.4)

    $D_{ij} = -\log \dfrac{J_{ij}}{\sqrt{J_{ii} J_{jj}}}$.    (2.5.5)

The  expression  (2.5.4)  is  the  same  as  the  "minimum  genetic 

distance"  (2.2.8)  of  Nei  (1978),  and  (2.5.5)  is  what  he  calls 

the  "standard  genetic  distance". 
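The two forms (2.5.4) and (2.5.5) can be illustrated as follows. This is a sketch under stated assumptions: the allele frequencies for two loci in `pop_i` and `pop_j` are hypothetical, and the quantities J are computed directly as expectations of the similarity function $s = (\sum_r \delta_r)/m$, i.e. as the average over loci of the probability that two randomly drawn alleles agree.

```python
from math import log, sqrt

def identity(freqs_a, freqs_b):
    """Expected similarity: mean over loci of sum_r p_ar * p_br."""
    per_locus = [sum(a * b for a, b in zip(pa, pb))
                 for pa, pb in zip(freqs_a, freqs_b)]
    return sum(per_locus) / len(per_locus)

# Hypothetical allele frequencies at two loci (one list per locus).
pop_i = [[0.6, 0.4], [0.2, 0.5, 0.3]]
pop_j = [[0.3, 0.7], [0.1, 0.3, 0.6]]

J_ii = identity(pop_i, pop_i)
J_jj = identity(pop_j, pop_j)
J_ij = identity(pop_i, pop_j)

D_min = 0.5 * (J_ii + J_jj) - J_ij          # form (2.5.4)
D_std = -log(J_ij / sqrt(J_ii * J_jj))      # form (2.5.5)
```

Both quantities are zero when the two populations have identical allele frequencies and positive otherwise.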

Again, in the example (2), we may define the similarity
function as $(\delta_1 \cdots \delta_m)$ instead of
$(\delta_1 + \cdots + \delta_m)/m$. The new function has the value
unity when the gene alleles coincide at all the loci and zero
otherwise. In such a case, when the characters are independent,

    $S_i = J_{ii}^{(1)} \cdots J_{ii}^{(m)}, \quad
     S_j = J_{jj}^{(1)} \cdots J_{jj}^{(m)}, \quad
     S_{ij} = J_{ij}^{(1)} \cdots J_{ij}^{(m)}$    (2.5.6)

where $J_{ij}^{(r)}$ are as defined in (2.2.5) and (2.2.6).
Taking logarithms of (2.5.6), the corresponding DISC is

    $D_{ij} = -\log \dfrac{S_{ij}}{\sqrt{S_i S_j}}
            = -\sum_r \log \dfrac{J_{ij}^{(r)}}{\sqrt{J_{ii}^{(r)} J_{jj}^{(r)}}}$    (2.5.7)

which Nei calls the "maximum genetic distance".


2.6  A  Functional  Equation 

Consider a multinomial distribution in k classes with
probabilities $p = (p_1, \ldots, p_k)$, and let H(p) be a DIVC.
The maximum DIV obtains when $p = (k^{-1}, \ldots, k^{-1}) = e$, say
(for evenness), so that we may have the condition:

    $C_1$: $\max_p H(p) = H(e)$.    (2.6.1)

Using H(p) as a DIVC, we can construct a DISC between the
multinomials defined by p and e by using (2.1.3),

    $D_{pe} = H\big(\tfrac{p + e}{2}\big) - \tfrac{1}{2} H(p) - \tfrac{1}{2} H(e)$.    (2.6.2)

The larger the value of H(p), the closer p is to e,
which suggests an alternative way of defining the DIS between
the populations defined by p and e as a quantity
proportional to

    $\max_p H(p) - H(p) = H(e) - H(p)$.    (2.6.3)

Equating (2.6.2) to a constant multiple of (2.6.3) we obtain
the functional equation

    $H\big(\tfrac{p + e}{2}\big) - \tfrac{1}{2}[H(p) + H(e)] = c\,[H(e) - H(p)]$

or

    $H\big(\tfrac{p + e}{2}\big) = (\tfrac{1}{2} + c)\, H(e) + (\tfrac{1}{2} - c)\, H(p)$    (2.6.4)

where c is a constant. There may be many solutions to (2.6.4)
subject to the condition $C_1$. We shall impose some regularity
conditions on H(p) in order to restrict the solutions to a
smaller class:

    $C_2$: H(p) is symmetric in $p_1, \ldots, p_k$.
    $C_3$: H(p) admits first and second order partial
           derivatives with respect to $p_1, \ldots, p_{k-1}$ and
           the $(k-1) \times (k-1)$ matrix

               $H''(p) = \big( \partial^2 H(p) / \partial p_i\, \partial p_j \big)$

           is continuous and not null at p = e.

Of course $H'(p) = 0$ at p = e in view of the condition $C_1$,
and the condition $C_3$ ensures that the diversity measure is
locally sensitive when p deviates from e.

We shall show that under the conditions $C_1$, $C_2$ and $C_3$,
the function H(p) satisfying the equation (2.6.4) is of the
form

    $H(p) = a\,(1 - \sum p_i^2) + b$    (2.6.5)

where a > 0 and b are constants, i.e., H(p) is essentially
the Gini-Simpson index.

(i) Using the condition $C_3$, we obtain on taking the
first and second derivatives of both sides of (2.6.4)
with respect to $p_1, \ldots, p_{k-1}$

    $\tfrac{1}{2} H'\big(\tfrac{p + e}{2}\big) = (\tfrac{1}{2} - c)\, H'(p)$    (2.6.6)

    $\tfrac{1}{4} H''\big(\tfrac{p + e}{2}\big) = (\tfrac{1}{2} - c)\, H''(p)$    (2.6.7)

where $H'$ is a (k-1)-vector and $H''$ is a $(k-1) \times (k-1)$
matrix. Putting p = e in (2.6.7)

    $\tfrac{1}{4} H''(e) = (\tfrac{1}{2} - c)\, H''(e)$    (2.6.8)

which implies that c = 1/4, using the condition $H''(e) \ne 0$.

(ii) The equation (2.6.7) becomes

    $H''\big(\tfrac{p + e}{2}\big) = H''(p)$.    (2.6.9)

Repeated use of (2.6.9) gives

    $H''(p) = H''[2^{-n}(p - e) + e] \to H''(e)$.    (2.6.10)

The equation (2.6.10) implies that H(p) is quadratic
in $p_1, \ldots, p_{k-1}$ which may be written, using the
condition of symmetry,

    $H(p) = \lambda_1 \sum p_i^2 + \lambda_2 \sum\sum_{i \ne j} p_i p_j + \lambda_3 \sum p_i + \lambda_4
          = \mu_1 \sum p_i^2 + \lambda_2 (1 - p_k)^2 + \lambda_3 (1 - p_k) + \lambda_4$    (2.6.11)

where $\mu_1 = \lambda_1 - \lambda_2$ and all the summations are taken
from 1 to k-1. The condition $C_2$ demands symmetry with respect to
$p_1, \ldots, p_k$, in which case (2.6.11) assumes the form

    $H(p) = \mu_1 \sum_{i=1}^{k} p_i^2 + \mu_2$.    (2.6.12)

Using the condition $C_1$, we find that $\mu_1 < 0$, in which
case H(p) is of the form

    $a\,(1 - \sum p_i^2) + b$    (2.6.13)

where a > 0, which is required to be proved.
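A quick numerical check of the functional equation (2.6.4) may help fix ideas. The sketch below assumes a hypothetical test point `p` with k = 3 and evaluates the difference between the two sides of (2.6.4) with c = 1/4, once for the Gini-Simpson index and once for the Shannon entropy (natural logarithms), which is used here only as a contrasting example.

```python
from math import log

def gini_simpson(p):
    return 1.0 - sum(x * x for x in p)

def shannon(p):
    return -sum(x * log(x) for x in p if x > 0)

def residual(H, p, c):
    """Left side minus right side of (2.6.4) for a given H, p and constant c."""
    k = len(p)
    e = [1.0 / k] * k                              # the even distribution
    mid = [(a + b) / 2 for a, b in zip(p, e)]      # the point (p + e) / 2
    return H(mid) - ((0.5 + c) * H(e) + (0.5 - c) * H(p))

p = [0.6, 0.3, 0.1]    # hypothetical test point
r_gini = residual(gini_simpson, p, 0.25)
r_shannon = residual(shannon, p, 0.25)
```

The residual vanishes (up to rounding) for the Gini-Simpson index and is clearly non-zero for the Shannon entropy at this point.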


3.  ENTROPY  AND  INFORMATION 


3.1  Measures  of  Entropy 

A wide variety of DIVC's have been introduced through
the concept of entropy and information. The general approach
in these cases is basically different from that of Section 2.1,
where a function $d(X_1, X_2)$ measuring the difference between
individuals $X_1$ and $X_2$ is chosen first and probability
distributions of $X_1$ and $X_2$ are used only to find the average
of $d(X_1, X_2)$. In practice, $d(X_1, X_2)$ would be chosen to
reflect some intrinsic dissimilarity between individuals relevant
to a particular investigation. On the other hand, a measure of
entropy is directly conceived of as a function defined on the
space of distribution functions, satisfying some postulates.
Some of the postulates are that it is non-negative, attains
the maximum for the uniform distribution and has the minimum
when the distribution is degenerate. Thus a measure of
entropy is an index of similarity of a distribution function
with the uniform distribution, and hence a measure of DIV.

We shall consider the space of all multinomial
distributions for simplicity of presentation of results, observing
that the formulae for the continuous case can be obtained by
replacing the summation by the integral sign. We represent
the probabilities in the k cells of a general multinomial
by $p_1, \ldots, p_k$ and for a particular population $\pi_i$ by
$p_{i1}, \ldots, p_{ik}$. Mathai and Rathie (1975) consider three
general forms for entropy:

    $H = (1 - \alpha)^{-1} \log \big( \sum p_r^{\alpha + \beta_r - 1} / \sum p_r^{\beta_r} \big)$    (3.1.1)

    $H = \big[ \big( \sum p_r^{\alpha + \beta_r - 1} / \sum p_r^{\beta_r} \big) - 1 \big] \div (2^{1 - \alpha} - 1)$    (3.1.2)

    $H = - \sum p_r^{\beta_r} \log p_r / \sum p_r^{\beta_r}$    (3.1.3)

where all the summations are taken from 1 to k. When
$\beta_r = 1$ for all r we have the familiar expressions
introduced by Renyi (1961), Havrda and Charvat (1967) and Shannon
(1948).
All the functions (3.1.1) - (3.1.3) are non-negative,
attain the maximum when $p_1, \ldots, p_k$ are equal (maximum
diversity) and are zero when $p_i = 1$, $p_j = 0$, $j \ne i$
(minimum diversity). Mathai and Rathie (1975) discuss the various
additional mathematical postulates which lead to these functions.
Patil and Taillie (1979) and Pielou (1975) provide interpretations
of some of these functions in the context of ecological studies.

The functions (3.1.1) - (3.1.3) are all concave and the
method of Section 2.1 can be used to construct a DISC between
$\pi_i$ and $\pi_j$. For instance, choosing (3.1.3) with
$\beta_r = 1$ as a DIVC, and a mixture $\pi_0$ of populations
$\pi_i$ and $\pi_j$ with a priori probabilities $\lambda_1$ and
$\lambda_2$, we have

    $H_i = - \sum_{r=1}^{k} p_{ir} \log p_{ir}$

    $H_0 = - \sum_{r=1}^{k} (\lambda_1 p_{ir} + \lambda_2 p_{jr}) \log (\lambda_1 p_{ir} + \lambda_2 p_{jr})$    (3.1.4)

    $H_0 - \lambda_1 H_i - \lambda_2 H_j
         = \lambda_1 \sum p_{ir} \log \dfrac{p_{ir}}{\lambda_1 p_{ir} + \lambda_2 p_{jr}}
         + \lambda_2 \sum p_{jr} \log \dfrac{p_{jr}}{\lambda_1 p_{ir} + \lambda_2 p_{jr}}$    (3.1.5)

which is the information radius defined by Sibson and Jardine
(1971) from other considerations.
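The equality of the two sides of (3.1.5) can be verified numerically. The sketch below assumes hypothetical cell probabilities `p_i`, `p_j` and a prior weight `lam1`; it computes the Jensen difference of the Shannon entropy and the weighted sum of the two logarithmic terms separately.

```python
from math import log

def shannon(p):
    """Shannon entropy with natural logarithms."""
    return -sum(x * log(x) for x in p if x > 0)

def kl(p, q):
    """Sum of p_r * log(p_r / q_r) over cells with p_r > 0."""
    return sum(a * log(a / b) for a, b in zip(p, q) if a > 0)

def information_radius(p, q, lam1=0.5):
    """Return both sides of (3.1.5) for prior probabilities lam1 and 1 - lam1."""
    lam2 = 1.0 - lam1
    mix = [lam1 * a + lam2 * b for a, b in zip(p, q)]
    jensen = shannon(mix) - lam1 * shannon(p) - lam2 * shannon(q)
    weighted = lam1 * kl(p, mix) + lam2 * kl(q, mix)
    return jensen, weighted

p_i = [0.7, 0.2, 0.1]    # hypothetical cell probabilities
p_j = [0.1, 0.3, 0.6]
jensen, weighted = information_radius(p_i, p_j, lam1=0.4)
```

With `lam1 = 0.5` the same function gives the symmetric version commonly used as a DISC between two populations.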

Similarly, the DISC between $\pi_i$ and $\pi_j$ obtained by
choosing (3.1.2) with $\beta_r = 1$ is

    $D_{ij} = \big[ \sum (\lambda_1 p_{ir} + \lambda_2 p_{jr})^{\alpha}
              - \lambda_1 \sum p_{ir}^{\alpha} - \lambda_2 \sum p_{jr}^{\alpha} \big] \div (2^{1 - \alpha} - 1)$    (3.1.6)

which, when $\alpha = 2$, reduces to the Euclidean distance, apart
from a constant multiplier,

    $2 \lambda_1 \lambda_2 \sum (p_{ir} - p_{jr})^2$.    (3.1.7)

The DISC obtained by choosing (3.1.1) with $\beta_r = 1$ is

    $D_{ij} = (1 - \alpha)^{-1} \log \dfrac{\sum (\lambda_1 p_{ir} + \lambda_2 p_{jr})^{\alpha}}
              {(\sum p_{ir}^{\alpha})^{\lambda_1} (\sum p_{jr}^{\alpha})^{\lambda_2}}$.    (3.1.8)

The formulae (3.1.5)-(3.1.8) involve explicitly the prior
probabilities $\lambda_1, \lambda_2$. In many practical applications,
it is appropriate to choose $\lambda_1 = \lambda_2 = 1/2$ to define
a DISC between two populations.

3.2  Apportionment  of  Diversity 

By considering a mixture $\pi_0$ of populations $\pi_1, \pi_2, \ldots$
with prior probabilities $\lambda_1, \lambda_2, \ldots$ we can obtain
a decomposition of DIV in $\pi_0$, based on any choice of the H
functions (3.1.1)-(3.1.3),

    $H_0 = \sum_r \lambda_r H_r + (H_0 - \sum_r \lambda_r H_r)
         = H(w) + D(b)$    (3.2.1)

as DIV within and DIS between populations. It may be noted
that D(b) cannot in general be obtained as a weighted
combination of DISC's between all pairs of populations as in (2.3.1)
for the choice of DIVC's derived by the method of Section 2.1.
(It is, however, true when H is chosen as in (3.1.2) with
$\beta_r = 1$ and $\alpha = 2$, in which case it also belongs to the
class of DIVC's derived in Section 2.1.) The ratio
$G(b) = D(b)/H_0$ has been used by geneticists as an index of
diversity between populations compared to within. However, as
observed in Section 2.3, its value depends on the H function
chosen. In their studies on diversity with respect to blood groups
and biochemical markers, Lewontin (1972) used the H function
(3.1.3) with $\beta_r = 1$, and Nei (1973) and Chakravarthy (1974)
used (3.1.2) with $\alpha = 2$ and $\beta_r = 1$. This raises the
question as to what is the optimum choice of a DIVC in a given
class {H} to study the apportionment of DIV as between and within
populations. A natural choice appears to be one which maximizes
the ratio $G(b) = D(b)/H_0$ or minimizes the ratio $H(w)/H_0$. Such
a choice will depend on the populations under study and the prior
probabilities.

To examine the extent to which the optimum choice depends
on the population distributions, the following computations
were made in the simple case of two binomial populations with
equal prior probabilities. The class of H functions considered
is a subclass of (3.1.2) and (3.1.3),

    $H^{(\alpha)} = (p_1^{\alpha} + p_2^{\alpha} - 1) / (2^{1 - \alpha} - 1)$

where for $\alpha = 1$, the function is defined by the limiting value

    $H^{(1)} = - p_1 \log p_1 - p_2 \log p_2$.

Table 1 gives the values of $D(b)/H_0$ for different combinations
of the proportions for the two binomials. For each combination,
the first entry corresponds to the value of G(b) for $\alpha = 1$,
the second for $\alpha = 2$, the third for $\alpha = 2.5$, the fourth
for the optimum $\alpha$, and the fifth entry within brackets gives
$\alpha^*$, the optimum value of $\alpha$. The blanks for certain
combinations indicate that the values are the same as for the
combination with the complementary values of $(p_1, q_1)$, the
binomial proportions of the two populations. It is seen that the
optimum value $\alpha^*$ of $\alpha$ depends on the values of
$p_1, q_1$, although it is stable for a wide range of values. If
$p_1$ and $q_1$ are both small or both large, $\alpha^*$ is small
and tends to zero as $p_1$ and $q_1$ approach zero or unity.
For values of $p_1, q_1$ near the boundary determined by the points
(.005, .7), (.01, .6), (.05, .5), (.1, .4), (.2, .3), $\alpha^*$ is
close to unity, which corresponds to the Shannon DIVC. For other
ranges of $(p_1, q_1)$, $\alpha^*$ is nearly 2.5, although
$\alpha = 2$, which corresponds to the Gini-Simpson index, is a
close competitor.
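The kind of computation behind these comparisons can be sketched as follows. The two-cell form of the diversity used here, $(p^{\alpha} + q^{\alpha} - 1)/(2^{1-\alpha} - 1)$ with the Shannon entropy in base 2 as the limit at $\alpha = 1$, is an assumption obtained from (3.1.2) and (3.1.3) with $\beta_r = 1$ and k = 2; the proportions 0.3 and 0.6 are hypothetical.

```python
from math import log2

def H_alpha(p, alpha):
    """Two-cell diversity; Shannon entropy (base 2) is used at alpha = 1."""
    q = 1.0 - p
    if abs(alpha - 1.0) < 1e-12:
        return -sum(x * log2(x) for x in (p, q) if x > 0)
    return (p ** alpha + q ** alpha - 1.0) / (2.0 ** (1.0 - alpha) - 1.0)

def G_b(p1, q1, alpha):
    """D(b) / H_0 for two binomial populations with equal prior probabilities."""
    H_0 = H_alpha(0.5 * (p1 + q1), alpha)                      # mixture diversity
    H_w = 0.5 * (H_alpha(p1, alpha) + H_alpha(q1, alpha))      # within populations
    return (H_0 - H_w) / H_0

# Ratio G(b) for a hypothetical pair of binomial proportions.
ratios = {alpha: G_b(0.3, 0.6, alpha) for alpha in (1.0, 2.0, 2.5)}
```

With this normalization every member of the class equals 1 at p = 1/2, so the ratios for different $\alpha$ are on a comparable scale; scanning `alpha` over a grid would give a rough numerical optimum for a given pair of proportions.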

The values of the ratio G(b) for the haptoglobin diversity
in 25 Caucasian populations considered by Lewontin (1972) for
different values of $\alpha$ are as follows:

    alpha:   1.0     2.0     2.5
    G(b):    .0209   .0249   .0251

The frequency of the haptoglobin allele in these cases varied
between 21% and 45%, except in one case where it was 12%. The
optimum $\alpha$ in such cases is about 2.5.

The first four vertical entries correspond to $\alpha$ = 1, 2, 2.5 and $\alpha^*$ respectively.
The last entry within brackets is $\alpha^*$, the optimal value.


Table 2 gives the values of $H_0$ and G(b) for 9 blood
group and 7 protein loci in the case of Makiritare Indians
from 7 different villages. These were computed using the data
kindly supplied by Chakravarthy (1974), assuming equal
population sizes for the villages. It is seen from Table 2 that for
the blood group loci, where p values are in the interval
(30%, 70%), the optimum $\alpha$ is 2.5; and for the biochemical
markers, where p values are in the interval (5%, 20%), the
optimum $\alpha$ is 1, although the differences in G values are not
large. The value of $\alpha = 2.5$ comes out better on the criterion
suggested for the choice of a DIVC. However, the value of
$\alpha = 2.0$ is a close competitor and has other desirable
properties (see Burbea and Rao, 1980).

TABLE 2

Gene DIV of Makiritare Indians in Seven Villages
and Index of DIS Between Villages

                          alpha = 1        alpha = 2       alpha = 2.5
Locus         Ave p     H_0     G(b)     H_0     G(b)     H_0     G(b)

Serological
  Diego       .196     .7139   .1743    .6303   .1711    .6240   .1693
  Kidd        .336     .9209   .0250    .8924   .0320    .8899   .0325
  Rh(C)       .418     .9805   .0401    .9731   .0542    .9724   .0554
  P           .434     .9874   .0172    .9826   .0232    .9821   .0237
  Lewis       .466     .9967   .0791    .9954   .1044    .9952   .1191
  Ss          .470     .9974   .0575    .9964   .0770    .9963   .0786
  Rh(E)       .563     .9885   .0058    .9841   .0079    .9837   .0081
  MN          .714     .8635   .0263    .8168   .0291    .8128   .0292
  Duffy       .736     .8327   .0122    .7772   .0142    .7726   .0166
  Average              .9202   .0415    .8943   .0448    .8921   .0486

Biochemical
  Ap          .0557    .3101   .0647    .2104   .0238    .2054   .0213
  Hp          .424     .9833   .0650    .9769   .0866    .9763   .0884
  Gc          .820     .6801   .0431    .5904   .0432    .5837   .0427
  PGM1        .848     .6148   .0592    .5156   .0504    .5086   .0488
  Lp          .876     .5407   .0084    .4345   .0052    .4275   .0047
  Alb         .9857    .1081   .1293    .0564   .1719    .0547   .1444
  6PGD        .991     .0741   .2503    .0357   .0678    .0346   .0934
  Average              .4730   .0561    .4028   .0521    .3987   .0522

4.  DISCRIMINATION  INDEX

A general method of constructing DISC's is through the
concept of discrimination between populations, i.e., the
probability with which a given individual can be identified as a
member of one of two populations to which he possibly belongs.

4.1  Overlap  Distance  (Rao,  1948,  1977;  Wald,  1950)

Let X be a set of measurements which has the probability
density $p_i(\cdot)$ in $\pi_i$ and $p_j(\cdot)$ in $\pi_j$. The best
decision rule based on an observed value x of X, for discriminating
between $\pi_i$ and $\pi_j$ with prior probabilities in the ratio 1:1,
is to assign x to

    population $\pi_i$ if $p_i(x) > p_j(x)$
    population $\pi_j$ if $p_i(x) < p_j(x)$    (4.1.1)

and to decide by tossing an unbiased coin when $p_i(x) = p_j(x)$.
The probability of correct classification for the optimum
decision rule is

    $\tfrac{1}{2} \int_{R_1} p_i(x)\, dx + \tfrac{1}{2} \int_{R_2} p_j(x)\, dx$    (4.1.2)

where $R_1$ is the region $p_i(x) > p_j(x)$ and $R_2$ the region
$p_i(x) < p_j(x)$. The minimum value of (4.1.2) is 1/2, which is
attained when $p_i(\cdot) = p_j(\cdot)$, and the maximum is unity when
the supports of $p_i(\cdot)$ and $p_j(\cdot)$ are disjoint. The more
dissimilar the populations are, the greater would be the
probability of correct classification. Then we may define the DISC
between $\pi_i$ and $\pi_j$ as


(4.1.3) 


which  is  in  the  range 


It  is  seen  that 


$\int |\, p_i(x) - p_j(x) \,|\, dx$


(4.1.4) 


which is a multiple of Kolmogorov's variational distance or city
block distance, which is a special case of the Minkowski distance

    $\big[ \int |\, p_i(x) - p_j(x) \,|^{t}\, dx \big]^{1/t}, \quad t \ge 1$.    (4.1.5)
In the development of decision theory, Wald (1950)
introduced the distance function between $\pi_i$ and $\pi_j$

    $D_{ij} = \max_R \big|\, \int_R [\, p_i(x) - p_j(x) \,]\, dx \,\big|$    (4.1.6)

where R represents any arbitrary region. The expression (4.1.6)
is identifiable as

    $1 - \int \min[\, p_i(x),\, p_j(x) \,]\, dx$    (4.1.7)

    $= \int_{R_1} [\, p_i(x) - p_j(x) \,]\, dx$    (4.1.8)

where $R_1$ is the region $p_i(x) > p_j(x)$ as in (4.1.2). The
expression (4.1.8) is the difference between the proportions of
correct and wrong classifications by using the optimum decision
rule (4.1.1). The expression (4.1.7) may be interpreted as the
proportion of mismatched individuals in the two populations.
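For discrete distributions the integrals in (4.1.6)-(4.1.8) become sums, and the equivalence of the different forms is easy to check. The sketch below is a discrete analogue under that assumption; the probability vectors `p` and `q` are hypothetical.

```python
def overlap_distances(p, q):
    """Three equivalent forms of the overlap distance for discrete p and q."""
    half_l1 = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
    one_minus_min = 1.0 - sum(min(a, b) for a, b in zip(p, q))   # form (4.1.7)
    over_R1 = sum(a - b for a, b in zip(p, q) if a > b)          # form (4.1.8)
    return half_l1, one_minus_min, over_R1

def prob_correct(p, q):
    """Probability of correct classification under rule (4.1.1), equal priors.

    Ties contribute through the coin toss, which gives the same total as
    taking the larger of the two probabilities in every cell.
    """
    return 0.5 * sum(max(a, b) for a, b in zip(p, q))

p = [0.5, 0.3, 0.2]    # hypothetical cell probabilities
q = [0.2, 0.3, 0.5]
half_l1, one_minus_min, over_R1 = overlap_distances(p, q)
correct_minus_wrong = 2.0 * prob_correct(p, q) - 1.0
```

All four quantities coincide, which illustrates the interpretation of (4.1.8) as the difference between the proportions of correct and wrong classifications.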

4.2  Quadratic  Differential  Metric  (Rao,  1948) 

Let us consider a family of probability densities $p(x, \theta)$,
$\theta \in \Theta$, a k-vector parameter space. The Fisher
information matrix at $\theta$ is $M = [m_{ij}(\theta)]$ where

    $m_{ij}(\theta) = \int \dfrac{1}{p}\, \dfrac{\partial p}{\partial \theta_i}\, \dfrac{\partial p}{\partial \theta_j}\, dx$.    (4.2.1)

We endow the space $\Theta$ with the quadratic differential metric

    $\sum_i \sum_j m_{ij}(\theta)\, \delta\theta_i\, \delta\theta_j$    (4.2.2)

and define the distance between two points $\theta_1$ and $\theta_2$
as the geodesic distance determined by (4.2.2). The expression
(4.2.2) is a measure of difference between two probability
distributions close to each other and the distance defined by it
may be useful in evolutionary studies where gradual changes take
place in a population in moving from state $\theta_1$ to state
$\theta_2$. In a recent paper Atkinson and Mitchell (1980) have
derived the expressions for geodesic distances based on (4.2.2)
for well known families of distributions.
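In the one-parameter case the geodesic distance determined by (4.2.2) is simply the integral of the square root of the Fisher information between the two parameter values. The sketch below illustrates this for a single Bernoulli observation, an example chosen here for simplicity and not taken from the paper; the parameter values 0.2 and 0.7 are hypothetical. The closed form follows from the substitution $\theta = \sin^2 \phi$.

```python
from math import asin, sqrt

def fisher_information(theta):
    """Fisher information of a single Bernoulli(theta) observation."""
    return 1.0 / (theta * (1.0 - theta))

def geodesic_numeric(theta1, theta2, steps=100000):
    """Midpoint-rule integral of sqrt(m(theta)) between the two states."""
    lo, hi = sorted((theta1, theta2))
    h = (hi - lo) / steps
    return sum(sqrt(fisher_information(lo + (i + 0.5) * h))
               for i in range(steps)) * h

def geodesic_closed_form(theta1, theta2):
    """2 * |arcsin(sqrt(theta2)) - arcsin(sqrt(theta1))|."""
    return 2.0 * abs(asin(sqrt(theta2)) - asin(sqrt(theta1)))

numeric = geodesic_numeric(0.2, 0.7)
exact = geodesic_closed_form(0.2, 0.7)
```

The numerical integral and the closed form agree to within the discretization error of the midpoint rule.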


4.3  Invariants  of  Jeffreys 

Jeffreys (1948) defined what are called invariants between
two distributions

    $I_m = \int \big|\, [p_i(x)]^{1/m} - [p_j(x)]^{1/m} \,\big|^{m}\, dx, \quad m > 0$

    $I_0 = \int [\, p_i(x) - p_j(x) \,] \log \dfrac{p_i(x)}{p_j(x)}\, dx$    (4.3.1)

where the second expression is the sum of the Kullback-Leibler
information numbers $I_{ij}$ and $I_{ji}$, with

    $I_{ij} = \int p_i(x) \log \dfrac{p_i(x)}{p_j(x)}\, dx$.    (4.3.2)

When m = 1,

    $I_1 = \int |\, p_i(x) - p_j(x) \,|\, dx$    (4.3.3)

which is Kolmogorov's variational distance (overlap distance of
Rao, 1948). When m = 2,

    $I_2 = \int \big[ \sqrt{p_i(x)} - \sqrt{p_j(x)} \big]^2 dx
         = 2 \big( 1 - \int \sqrt{p_i(x)\, p_j(x)}\, dx \big)$    (4.3.4)

which is extensively used by Matusita (1957) in inference problems.
The expression (4.3.4) is a function of the Hellinger distance

    $\cos^{-1} \int \sqrt{p_i(x)\, p_j(x)}\, dx$.    (4.3.5)

Rao and Varadarajan (1963) have defined the Hellinger DISC to
be

    $- \log_e \int \sqrt{p_i(x)\, p_j(x)}\, dx$.    (4.3.6)

The measure (4.3.5) was proposed by Bhattacharya (1946) as a
DISC between populations $\pi_i$ and $\pi_j$ and has been used in
some genetic studies. The alternative expression (4.3.6) has an
advantage over (4.3.5) in the sense that it is additive with
respect to characteristics independently distributed in the
populations.
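The additivity property can be illustrated in the discrete case, where the integrals are replaced by sums. The sketch below uses hypothetical distributions for two independently distributed characteristics in each population and compares the measures (4.3.4)-(4.3.6) on the joint and the marginal distributions.

```python
from math import acos, log, sqrt

def affinity(p, q):
    """Sum of sqrt(p_r * q_r), the discrete analogue of the integral in (4.3.4)."""
    return sum(sqrt(a * b) for a, b in zip(p, q))

def matusita(p, q):
    """I_2 of (4.3.4)."""
    return sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q))

def angle(p, q):
    """Measure (4.3.5); min() guards against rounding slightly above 1."""
    return acos(min(1.0, affinity(p, q)))

def log_affinity(p, q):
    """Measure (4.3.6)."""
    return -log(affinity(p, q))

def joint(p, r):
    """Joint distribution of two independent characteristics."""
    return [a * b for a in p for b in r]

# Hypothetical marginals: first characteristic (p1, p2), second (r1, r2).
p1, p2 = [0.5, 0.5], [0.9, 0.1]
r1, r2 = [0.2, 0.8], [0.6, 0.4]

d_joint = log_affinity(joint(p1, r1), joint(p2, r2))
d_sum = log_affinity(p1, p2) + log_affinity(r1, r2)
a_joint = angle(joint(p1, r1), joint(p2, r2))
a_sum = angle(p1, p2) + angle(r1, r2)
```

The logarithmic measure on the joint distribution equals the sum of its values on the two characteristics, whereas the angular measure does not decompose in this way.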

It is seen that there are various approaches for measuring
DIV and DIS and some of the controversies on the choice of
these measures in practical investigations (see Li, 1978; Nei,
1978; Morton, 1973; and Smith, 1977) may be resolved through the
concepts developed in the present paper. Some further work in
this direction, which is in progress, will be reported elsewhere.